Abstract



This paper presents the theory behind a model for a two-stage analog
network for edge detection and image reconstruction to be implemented
in VLSI. Edges are detected in the first stage using the multi-scale veto 
rule, which states that an edge is significant if and only if it passes a 
threshold test at each of a set of different spatial scales. The image is 
reconstructed in the second stage from the brightness values adjacent to 
the edge locations. Among the key features of this model are that edges 
are localized at the resolution of the smallest spatial scale without having 
to identify maxima in brightness gradients, while noise is removed with 
the efficiency of the largest scale. There are no problems of local minima, 
and for any given set of parameters there is a unique solution. Images 
reconstructed from the brightnesses adjacent to the marked edges are very 
similar visually to the originals. Significant bandwidth compression can 
thus be achieved without noticeably compromising image quality. 
1. Introduction



In a real-time system, it is desirable to find edges, or sharp changes in the image brightness function, quickly and accurately. Speed leaves more time for the computationally intensive processes that use the edges, such as feature matching. Accuracy is needed to supply these processes with reliable input. Accurate edge detection means selectively ignoring gradients in the brightness function caused by high spatial-frequency features attributable to noise, while marking those caused by high-frequency features such as corners and junctions. It also requires that the edges be well localized to the contours of the image features that generate them. Noise can be removed by applying a linear lowpass smoothing filter. However, such a filter attenuates all high-frequency components indiscriminately and introduces uncertainty in edge locations. Non-linear methods that preserve important edges while removing noise, such as median filtering, have existed for some time. These methods generally require more computation than linear filtering, however, and cannot be implemented by convolution. Of particular interest to designers of real-time systems are methods that can be built in silicon. One recently developed analog VLSI technique is the resistive fuse network invented by Harris [8], based on the weak membrane model of Blake and Zisserman [3]. In this paper we propose another computational model that can also be implemented in analog VLSI and that overcomes some of the disadvantages of the weak membrane model.

The multi-scale veto, or MSV, model is similar to the weak membrane in that it assumes an image can be approximated by a collection of piecewise smooth functions. Edges are 'break points', i.e., locations where the brightness function is not required to be smooth. The MSV model differs from the weak membrane, however, in two respects. First, it does not reconstruct the image from all of the data, but only from the brightness values of pixels on either side of the edges. Second, the networks used for edge detection and image reconstruction are physically distinct. As a result, problems associated with the non-convexity of the weak membrane are avoided.

The MSV model derives its name from the method it uses for detecting edges. Edges are defined as loci of sharp changes in the image brightness function which are significant over a range of spatial scales. An important aspect of the MSV model is that edges do not necessarily correspond to local maxima in the magnitude of the gradient. The model therefore responds not only to step changes in brightness, but also to strongly shaded surfaces, which do not always give rise to well-defined maxima in the gradient. On a discrete two-dimensional array, edges occur between two pixels (nodes). The spatial scale is determined by the space constant of the smoothing network to which voltage sources, proportional to the sampled brightness values at each pixel, are connected. Differences are computed between the smoothed voltages at neighboring nodes of the network and compared to a threshold which is also a function of scale. In applying the multi-scale veto rule, two or more scales are used, and all must agree on the presence of a significant difference between two nodes before an edge is marked. If at any scale the difference between the smoothed brightnesses is below threshold, the edge is vetoed. As will be discussed in Section 4.1, this method allows edges to be localized at the resolution of the smallest scale, while noise is removed with the efficiency of the largest scale.

Two points, which are discussed later in detail, are significant to note about the 
MSV edge detection network: 

• It does not require computation of second differences, and 

• All of the difference operations and threshold tests at different scales can be 
performed on the same physical network. 

Both points represent considerable savings in circuitry, a crucial consideration
if the network is to be designed to work with large image arrays.

The second piece of the MSV model is the reconstruction network. While this circuit performs nothing more complicated than interpolation from the brightness values next to the marked edges, it is significant that the images reconstructed in this manner are very similar visually to the originals. Since only a fraction of the original data points are needed for reconstruction (typically 15 to 45% of the image, depending on the amount of detail in the scene), we can save storage and transmission bandwidth by encoding only these values. Combined with existing compression methods such as run-length and Huffman coding, the total savings may be significant.

This paper is organized as follows: In the next section we review related work 
in edge detection, multi-scale methods and image reconstruction. In Section 3 we 
describe the circuit models of the edge detection and reconstruction networks, and in 
Section 4 we discuss performance issues and show results from computer simulations
on some test images. In the last section we compare the MSV model to the weak
membrane and characterize the differences between the computations they perform.



2. Related Work 



2.1. Edge detection and the use of multiple scales 

As explained by Torre and Poggio [22], the numerical differentiation of images is 
an ill-posed problem that must be regularized in order to obtain a stable solution. 
The regularization function in this case takes the form of a smoothing filter which 
must be applied before differentiation. In most work in computer vision, edges are 
defined to be the loci of maxima in the magnitude of the smoothed brightness gradient 
and can be detected from zero-crossings in the second derivative. This is the basis on
which many edge and line detectors, such as the Marr-Hildreth Laplacian-of-Gaussian
(LOG) filter [17], the Canny edge detector [4], and the Binford-Horn line finder [9],
have been designed.

As stated in the introduction, isotropic smoothing filters such as the Gaussian 
have the disadvantage that they smooth away important features as well as noise. 
Smoothing can displace points of maximum gradient, such as around the cusp of a 
brightness 'corner', or remove them altogether. Many efforts have therefore focused 
on developing more selective, edge-preserving smoothing methods. One possibility
is non-linear filtering. The median filter [7], for example, has often been used in
image processing because it is particularly effective in removing impulse, or
'salt-and-pepper', noise.

Another approach put forward in recent years is the idea of edge detection, or 
more precisely image segmentation, as a problem in minimizing energy functionals. 
The first proposal of this nature was the Markov Random Field (MRF) model of 
Geman and Geman [6]. In an MRF the minimum energy state is the maximum a 
posteriori (MAP) estimate of the energies at each node of a discrete lattice. The MAP 
estimate corresponds to a given configuration of neighborhoods of interaction. 'Line 
processes' are introduced on the lattice to inhibit interaction between nodes which 
have significantly different prior energies, thereby maintaining these differences in 
the final solution. Mumford and Shah [18] studied the energy minimization problem
reformulated in terms of deterministic functionals to be minimized by a variational
approach. Specifically, they proposed finding optimal approximations of a general
function d(x,y), representing the data, by differentiable functions u(x,y) that are
minimizers of

$$E(u,T) = \mu^2 \iint (u - d)^2\,dx\,dy + \iint |\nabla u|^2\,dx\,dy + \nu\,|T| \tag{1}$$

where T is a closed set of singular points, in effect the edges, at which u is allowed
to be discontinuous, and |T| is its total length. Blake and Zisserman [3] referred to (1)
as the 'weak membrane' model, since E(u,T) resembles the potential energy function of
an elastic membrane which is allowed to break in some places in order to achieve a lower
energy state. They derived a continuation method, which they referred to as the Graduated
Non-Convexity (GNC) algorithm, to minimize (1) iteratively.

The weak membrane model was one of the first methods to be implemented in 
analog VLSI. Digital circuits for performing Gaussian convolution and edge detection 
began appearing in the early 80's [1,11]. The possibility of performing segmentation 
and smoothing with analog circuitry, however, did not seem practical until the problem
had been posed in terms of a physical model. Harris [8] invented the first CMOS
resistive fuse circuit for minimizing (1) on a discrete grid. A resistive fuse is a
two-terminal non-linear element which behaves as a linear resistor over a certain voltage
range, but transforms into an open circuit if the voltage across its terminals becomes
too large.
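The fuse behavior just described can be summarized by a two-branch current law. The sketch below models an idealized element only, as our own simplification: the breakdown voltage `v_break` and the function name `fuse_current` are hypothetical, and the abrupt switch to an open circuit ignores whatever transition an actual circuit exhibits.

```python
def fuse_current(v, r, v_break):
    """Current through an idealized resistive fuse of resistance r.

    The element obeys Ohm's law while |v| <= v_break and becomes an open
    circuit (zero current) once the voltage across it exceeds v_break.
    """
    return v / r if abs(v) <= v_break else 0.0


# Below breakdown the fuse conducts; above it, it blocks.
print(fuse_current(0.5, 1000.0, 1.0))   # small voltage: behaves as a resistor
print(fuse_current(2.0, 1000.0, 1.0))   # large voltage: open circuit
```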

The issue of scale arises in edge detection because of the tradeoff between accurate localization of features and sensitivity to noise. Since important features generally occur over a range of spatial scales, many methods have been based on the use of information at multiple scales. Marr and Hildreth first proposed finding edges from the coincident zero-crossings of different sized LOG filters. Witkin [23] introduced the notion of scale-space filtering, in which the zero-crossings of the LOG are tracked as they move with scale changes. In the weak membrane model, there are two parameters to specify which, in a sense, determine the scale: μ, which controls the smoothness of the fitted solution u(x,y), and ν, which determines the penalty assigned to the discontinuities. Richardson [21] developed a scale-independent iterative algorithm for minimizing an energy formulation similar to (1). In each iteration, the variational problem is solved for some input image, d(x,y), and some values of μ and ν. The result is that feature boundaries apparent at the coarsest scale defined by the initial values of μ and ν are localized with the resolution of the finest scale used in the last iteration. Small features, however, are not detected because they do not generate discontinuities at the coarse scale and hence are smoothed away. The principle applied in Richardson's algorithm is very similar to that of the multi-scale veto rule. The MSV model, however, does not involve solving a variational problem.

The MSV model differs from other edge detection methods in that it does not
define edges as points of maximum gradient, and hence does not require second-derivative
operators. By defining edges as the loci of significant abrupt changes in the
image brightness function, it detects edges produced by features that generate step
changes in brightness, as well as those produced by features, such as shaded surfaces,
that do not necessarily give rise to maxima in the gradient. The MSV model is similar
to the weak membrane in that it assumes the image can be well-approximated by a 
set of piecewise smooth functions whose boundaries are the edges. Multiple scales 
are used in order to ensure that the differences measured between neighboring pixels 
are due to spatially significant features and not to noise. As will be discussed further 
in Section 4.1, the method allows good localization of features because, unlike the 
points of maximum gradient, points where the brightness differences are significant 
will not move with smoothing. 

The edges produced by the MSV model are not as 'refined' as those produced by 
more complex methods such as Canny's edge detector [4] or Richard's CARTOON 
algorithm [20]. This is in part due to the way edges are defined, and in part due 
to the need to make the circuitry as simple as possible in order to minimize silicon 
area. There is no room for contour filling-in or texture edge removal. Our contention 
is that the edges produced by the MSV network are nonetheless functionally useful. 
We will demonstrate their usefulness in conjunction with the reconstruction network, 
and we believe that they will prove to be sufficient as well for other early vision tasks 
such as primitive feature matching. 



2.2. Image Reconstruction 

In the weak membrane, the functions u(x,y) which minimize (1) given the discontinuity set, T, result from smoothing all the data with a filter of scale 1/μ, with the restriction that smoothing is inhibited across edges. In the MSV model the reconstructed image is generated by interpolation from the brightness values adjacent to the marked edges, and hence uses only a fraction of the original data. Before continuing, we briefly mention some other methods for reconstructing images from sparse data points.

A significant amount of work in communications theory has been devoted to the problem of reconstructing signals from their zero-crossings. An often cited theorem by Logan [15] is that almost all bandpass one-dimensional signals of bandwidth less than one octave are uniquely specified by their zero-crossings. Curtis and Oppenheim [5] extended Logan's theorem to two dimensions and showed that any real two-dimensional doubly-periodic bandlimited function f(x,y) is uniquely specified to within a constant scale factor by its zero-crossings, or its crossings of an arbitrary threshold. The number of zero-crossings needed to specify f(x,y) may be large, however, and their method is not likely to be practical for reconstructing large images with significant high-frequency components, since it requires precise knowledge of the zero- or threshold-crossing locations.

One well-known example of an instance where an image can be reconstructed from 
sparse data is the case of Mondrian patches, first used by Land and McCann [13] to 
demonstrate their theory of the computation of lightness. The human visual system is 
very good at determining the reflectance of an object, under a variety of illuminating 
conditions. Land and McCann showed that one could recover, to an arbitrary scale 
factor, the reflectances of Mondrian patches by measuring the ratio of brightnesses at 
each step change on a closed path around the image. Horn [10] later showed how the 
same computation could be performed on a parallel network by first computing and 
then thresholding the Laplacian of the logarithm of brightness. More recently, Blake 
[2] suggested a modification to Horn's algorithm by having the threshold operation 
depend on the magnitude of the gradient rather than the Laplacian of the logarithm 
of brightness. In a sense, the MSV model can be considered an extension of these algorithms, although it is the original brightness function, and not surface reflectance, that is being recovered. Algorithms for the computation of lightness first showed that under certain circumstances it is possible to regenerate an image from the differences in (log) brightness across patch boundaries, where there is a step change in brightness. In the MSV model, we show that an image can be recovered from the brightnesses adjacent to edges under more general conditions.



3. Circuit Models 



In this section we describe the circuit models for the edge detection and reconstruction
networks. As the actual circuit is currently in the design phase, implementation
issues will not be discussed in this paper.



3.1. Edge Detection 

The fundamental principle of this network is the multi-scale veto rule for detecting 
significant changes in the image brightness function. This rule states that an edge 
exists between neighboring pixels if and only if the change in brightness between 
them is significant over a range of spatial scales. The scales are determined by the 
space constants of isotropic smoothing filters applied to the entire image. Differences 
are computed between the smoothed values at neighboring pixels and compared to 
a threshold which is a function of the scale. If the magnitude of the difference is 
greater than threshold at each scale, an edge is marked. If at any of the scales the 
difference is below threshold, however, the edge is vetoed. 
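The rule can be sketched for a 1-D brightness sequence in a few lines of Python. This is a minimal illustration under assumptions of our own, not a model of the circuit: each scale is supplied as an explicit symmetric smoothing kernel rather than produced by the resistive grid, borders are handled by replicating the end samples, and a difference exactly equal to the threshold counts as a veto. The names `smooth` and `msv_edges` are hypothetical.

```python
def smooth(x, kernel):
    """Correlate x with a symmetric, odd-length kernel, replicating the end samples."""
    half = len(kernel) // 2
    n = len(x)
    return [sum(w * x[min(max(i + j - half, 0), n - 1)] for j, w in enumerate(kernel))
            for i in range(n)]


def msv_edges(x, scales):
    """Apply the multi-scale veto rule to a 1-D brightness sequence x.

    scales: list of (kernel, threshold) pairs; the kernel [1.0] means no smoothing.
    Returns one flag per neighbouring pair (x[i-1], x[i]), i = 1 .. len(x) - 1.
    """
    marked = [True] * (len(x) - 1)     # every site starts out as a candidate edge
    for kernel, tau in scales:         # one threshold test per scale
        u = smooth(x, kernel)
        for i in range(1, len(x)):
            if abs(u[i] - u[i - 1]) <= tau:
                marked[i - 1] = False  # veto: this scale sees no significant change
    return marked


step = [0.0] * 8 + [10.0] * 8
impulse = [0.0] * 8 + [10.0] + [0.0] * 7
scales = [([1.0], 5.0), ([0.25, 0.5, 0.25], 3.0)]
print(msv_edges(step, scales))     # True only between x[7] and x[8]
print(msv_edges(impulse, scales))  # all False: vetoed at the smoothed scale
```

With these two scales, a step of height 10 survives both tests at its single transition, while an impulse of the same height passes the unsmoothed test but is vetoed at the smoothed scale, anticipating the analysis of Section 4.1.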

It is not necessary to build a multi-dimensional network in order to implement 
the multi-scale veto rule. By including time as a dimension, a single smoothing 
network with controllable spatial scale, such as the resistive grid with variable vertical 
resistances shown in Figure 1, can be used. The combined result of the threshold 
tests at each scale is encoded by a capacitor whose charge represents the AND of the 
different tests. The network shown in Figure 1(a) is one-dimensional; however, the 
extension to two dimensions is straightforward. By equating the current through the 
vertical resistors connected to the node voltage sources d_i, which are proportional to
the sampled brightnesses, to the sum of the currents leaving the node through the 
horizontal resistors, one easily arrives at the resistive grid equation: 

$$u_i - \frac{R_v}{R_h}\sum_k (u_k - u_i) = d_i \tag{2}$$

where the subscript k is an index over the nearest neighbors of node i. 

The continuous 2-D approximation to this circuit is the diffusion equation

$$u - \lambda^2 \nabla^2 u = d \tag{3}$$

with

$$\lambda = \sqrt{R_v / R_h} \tag{4}$$

which is the characteristic length over which a point source input will be smoothed.
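To see what the grid computes, equation (2) can be solved directly for a one-dimensional chain of nodes. The sketch below is our own illustration rather than a circuit model: it assumes ideal resistors with R_h > 0, treats the two end nodes as having a single neighbor (so no current leaves the chain), and solves the resulting tridiagonal system with the standard Thomas algorithm. The function name `solve_grid_1d` is hypothetical.

```python
def solve_grid_1d(d, rv, rh):
    """Node voltages u of a 1-D resistive chain, i.e. the solution of equation (2).

    Each node i is tied to its source d[i] through rv and to its neighbours
    through rh. End nodes have only one neighbour, so no current leaves the
    chain. The tridiagonal system is solved with the Thomas algorithm.
    """
    n = len(d)
    a = rv / rh                       # equals lambda**2 by equation (4)
    if n == 1:
        return [float(d[0])]
    diag = [1 + 2 * a] * n            # 1 + a * (number of neighbours)
    diag[0] = diag[-1] = 1 + a
    off = -a                          # coupling coefficient to each neighbour
    c = [0.0] * n                     # modified super-diagonal
    r = [0.0] * n                     # modified right-hand side
    c[0] = off / diag[0]
    r[0] = d[0] / diag[0]
    for i in range(1, n):             # forward elimination
        m = diag[i] - off * c[i - 1]
        c[i] = off / m
        r[i] = (d[i] - off * r[i - 1]) / m
    u = [0.0] * n
    u[-1] = r[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        u[i] = r[i] - c[i] * u[i + 1]
    return u


impulse = [0.0] * 10 + [1.0] + [0.0] * 10
print([round(v, 3) for v in solve_grid_1d(impulse, rv=4.0, rh=1.0)])
```

With R_v/R_h = 4 (λ = 2) the impulse response falls off by a roughly constant factor per node, close to e^(−1/λ) ≈ 0.61, and the total charge is conserved because no current leaves the chain. Setting R_v = 0 returns the input unchanged, which corresponds to the unsmoothed scale.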

One operational cycle of the MSV network corresponds to sensing an image, performing the threshold tests at each scale, and offloading the results. The cycle is divided into time intervals with operations controlled by external circuitry. It is assumed that the number of threshold tests is small (~5-10) and that the time they require is short compared to the image acquisition time, so that operation can proceed at frame rate. In the first interval, corresponding to image acquisition, during which the voltage sources d_i are generated, a control signal, P, connected to the edge precharge circuit goes high, precharging all of the capacitors, C_e. At the end of the sampling period, P goes low and stays low for the remainder of the cycle. In the following intervals, R_v is changed to set the value of the space constant. The absolute values of the differences between neighboring node voltages are compared to a threshold, and the edge capacitors at sites where the tests fail are discharged. The final phase of the cycle corresponds to moving the edge charges and the brightness values neighboring the edge locations onto another circuit where further processing takes place. The smallest scale used in the computation may correspond to R_v = 0, i.e., no smoothing at all, and the largest one may correspond to λ ≫ 1. The values used are externally set parameters.



3.2. The Reconstruction Network 

The reconstruction network, as shown in 1-D in Figure 2, regenerates the image by 
interpolation from the brightness values on either side of the marked edges. Voltage 
sources proportional to the original brightnesses di are switched to the resistive grid 
according to whether or not the node is adjacent to an edge. The control signal 
which closes the switch is logically equivalent to the OR of the states of all the edge 
capacitors adjacent to the node.

(a) 1-D multi-scale veto edge detection network. d_i and d_{i+1} are voltage sources proportional to the sampled brightnesses. The box labeled EPC is the edge precharge circuit shown below.
(b) Edge precharge circuit. P is a pulsed clock signal which goes high during the image acquisition period. The comparator output is high if |u_i − u_{i+1}| < τ, where τ is a globally specified threshold. The capacitor C_e encodes the edge location.

Figure 1: Components of the edge detection network

Figure 2: The 1-D reconstruction network. d_i and d_{i+1} are voltage sources proportional to the original brightnesses; u_i and u_{i+1} are the reconstructed brightnesses. V_{e,i} and V_{e,i+1} control switches connecting the sources to the grid. Each is logically equivalent to the OR of the capacitor states between the node and its neighbors.

As seen from equations (2) and (3) with R_v = ∞ (λ = ∞), which is the effective value at nodes whose switches are open, the distribution of voltages on the resistive network at non-edge nodes solves a discrete form of Laplace's equation. Along the outer border we impose the condition that the current flowing out of the grid, and hence the normal derivative of the voltage, is zero. It is easy to see that the solution to the reconstruction network is therefore unique and well-defined, provided at least one switch is closed: there are exactly as many equations as unknown node voltages, and the clamped nodes remove the arbitrary constant that zero-flux borders alone would leave.
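The steady state of the reconstruction network can be mimicked in software. In this sketch, which is our own and does not model circuit dynamics, edges are supplied as two Boolean arrays (one for horizontal neighbor pairs, one for vertical); each pixel's switch is the OR of the edges touching it, clamped pixels keep their original brightness, and the remaining pixels are relaxed by Gauss-Seidel iteration until each equals the average of its in-grid neighbors, which is the discrete Laplace equation with zero-flux borders. The array layout, the tolerance, and the name `reconstruct` are assumptions.

```python
def reconstruct(img, h_edges, v_edges, tol=1e-6, max_iter=10_000):
    """Harmonic interpolation of an image from the pixels adjacent to marked edges.

    img: list of rows of brightness values.
    h_edges[r][c]: True if an edge lies between (r, c) and (r, c + 1).
    v_edges[r][c]: True if an edge lies between (r, c) and (r + 1, c).
    Returns (reconstructed image, switch mask).
    """
    rows, cols = len(img), len(img[0])
    # Switch state of each pixel: OR of all edge sites touching it.
    fixed = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols - 1):
            if h_edges[r][c]:
                fixed[r][c] = fixed[r][c + 1] = True
    for r in range(rows - 1):
        for c in range(cols):
            if v_edges[r][c]:
                fixed[r][c] = fixed[r + 1][c] = True
    # Start free pixels at the mean of the clamped values (0 if none are clamped).
    clamped = [img[r][c] for r in range(rows) for c in range(cols) if fixed[r][c]]
    start = sum(clamped) / len(clamped) if clamped else 0.0
    u = [[float(img[r][c]) if fixed[r][c] else start for c in range(cols)]
         for r in range(rows)]
    for _ in range(max_iter):
        change = 0.0
        for r in range(rows):
            for c in range(cols):
                if fixed[r][c]:
                    continue
                # Only in-grid neighbours count: no current flows out of the border.
                nbrs = [u[rr][cc] for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]
                if not nbrs:
                    continue
                new = sum(nbrs) / len(nbrs)
                change = max(change, abs(new - u[r][c]))
                u[r][c] = new
        if change < tol:
            break
    return u, fixed


row = [[0.0, 0.0, 5.0, 5.0, 5.0, 12.0, 12.0]]
edges = [[True, False, False, False, False, True]]
u, fixed = reconstruct(row, edges, [])
print([round(v, 3) for v in u[0]])   # linear interpolation between the clamped pixels
```

In one dimension the result is simply linear interpolation between clamped pixels, as the example shows; in two dimensions the free pixels take the harmonic interpolant of the clamped values.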

It should be emphasized that several implementational issues are left open in 
presenting this conceptual picture of the reconstruction network. Clearly, the manner 
of setting the switches and charging the voltage sources in the reconstruction network 
is a major design problem whose solution will depend on the application in which the 
network is used. In this paper, however, we would like to focus on what the network 
does, rather than on how it should be built, and demonstrate that the results it 
produces are in fact worth the design effort. 

The idea that the image can be reconstructed by solving Laplace's equation on a resistive grid subject to the given boundary conditions is based on the assumption that we can model an image as a collection of piecewise harmonic functions. If this assumption held exactly, only the brightness values bordering edges, where the functions are not required to be harmonic, would need to be specified in order to recover the image completely. A real image is of course always corrupted by noise and will never be exactly harmonic except coincidentally. What we seek to reconstruct is a visually acceptable approximation. For the method to work well, the edges, which determine where the switches are closed in the reconstruction network, must accurately represent locations where the image brightness function deviates significantly from harmonicity. This is another reason for not defining edges as local maxima in the magnitude of the gradient, since the brightness function may deviate from harmonicity without exhibiting a maximum in its gradient. This happens often at junctions between the projections of different objects in the scene, as well as in many other instances. Marking only the points of maximum gradient would miss these locations, with the result that the network would force an interpolated solution between nodes which should not otherwise interact. The reconstructed image in this case will not be a visually acceptable approximation to the original.



4. Performance issues: Theory and Results 
4.1. Effect of the multi-scale veto rule 

One way to understand the effect of the veto operation is to consider how it relates to the Fourier spectrum of energies contained in an edge. Since the operations are performed on a discrete network, it is appropriate, and simpler, to use discrete Fourier transforms. Let x[n], with Fourier transform X(e^{jω}), denote a one-dimensional sequence of sampled brightnesses which has an abrupt change in value between n = 0 and n = −1. We will assume that the dimension of the network is ≫ 1, so that we can approximate frequency, ω, by a continuous variable on [−π, π]. Let y[n] = x[n] − x[n − 1] denote the difference sequence, and h_k[n] denote the convolution kernel of a lowpass filter H_k(e^{jω}) of support size k. From [19] the value of y[0] is equal to

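$$y[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(1 - e^{-j\omega}\right) X(e^{j\omega})\,d\omega \tag{5}$$

and the value of the smoothed difference y_k[n] at n = 0 is

$$y_k[0] = (h_k * y)[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_k(e^{j\omega}) \left(1 - e^{-j\omega}\right) X(e^{j\omega})\,d\omega \tag{6}$$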

Equations (5) and (6) are valid, even though we are working with two-dimensional 
images, since we are taking differences in only one direction. We can integrate the 2-D 
Fourier transform over the orthogonal frequency and redefine variables accordingly. 
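Equation (6) can be checked numerically for finite sequences. The sketch below is an illustration we add under our own assumptions: it evaluates the integral in (6) with a uniform Riemann sum, which is exact up to rounding when every nonzero exponent of the trigonometric-polynomial integrand is smaller in magnitude than the number of sample frequencies, and compares the result with the direct convolution (h_k * y)[0]. Sequences are passed as Python lists together with the index of their first sample; all function names are hypothetical.

```python
import cmath
import math


def dtft(seq, start, w):
    """DTFT at frequency w of a finite sequence whose first sample has index `start`."""
    return sum(v * cmath.exp(-1j * w * (start + m)) for m, v in enumerate(seq))


def yk0_frequency(x, x_start, h, h_start, num=512):
    """Right-hand side of (6): (1/2pi) * integral of H_k (1 - e^{-jw}) X over [-pi, pi]."""
    acc = 0j
    for i in range(num):
        w = -math.pi + 2 * math.pi * i / num
        acc += dtft(h, h_start, w) * (1 - cmath.exp(-1j * w)) * dtft(x, x_start, w)
    return (acc / num).real  # the 1/(2pi) factor cancels the 2pi/num step width


def yk0_direct(x, x_start, h, h_start):
    """Left-hand side of (6): (h_k * y)[0] with y[n] = x[n] - x[n - 1]."""
    def xv(n):
        i = n - x_start
        return x[i] if 0 <= i < len(x) else 0.0
    # (h * y)[0] = sum over taps n of h[n] * y[-n]
    return sum(hv * (xv(-(h_start + m)) - xv(-(h_start + m) - 1)) for m, hv in enumerate(h))


h = [0.25, 0.5, 0.25]                    # symmetric filter, taps at n = -1, 0, 1
step = [0.0] * 10 + [10.0] * 10          # x[n] = 10 u[n] on n = -10 .. 9
print(yk0_direct(step, -10, h, -1), yk0_frequency(step, -10, h, -1))
print(yk0_direct([10.0], 0, h, -1), yk0_frequency([10.0], 0, h, -1))
```

For the step the two sides agree at A·h_k[0] = 5, and for the impulse at A(h_k[0] − h_k[1]) = 2.5; these are the quantities compared against τ_k in the analysis that follows.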
We are interested in determining under what conditions an edge will be detected at y[0], given the input sequence x[n] and the smoothing filter h_k[n], when the veto rule is applied to the difference sequences y[n] and y_k[n]. We will examine two special cases: one where the input is a step, and one where it is an impulse of the same height as the step. These cases correspond to ideal 1-D profiles of a step edge and of an isolated noise spike. We want to show how the multi-scale veto rule can discriminate between these cases by marking the step edge at the point where the input changes abruptly and rejecting the impulse as noise.

Let τ be the threshold used for the unsmoothed differences y[n], and let τ_k be the threshold used for the smoothed difference sequence y_k[n]. Suppose x[n] = A u[n], where A is a positive constant and u[n] is the unit step. Then y[n] = A δ[n], where δ[n] is the unit impulse; y[0] = A and y_k[0] = A h_k[0]. If τ < A and τ_k < A h_k[0], the edge will be marked at n = 0. At other values n ≠ 0, y_k[n] = A h_k[n], which is not zero in general. It is even possible that for some n, |A h_k[n]| > τ_k, but since y[n] = 0 for all n ≠ 0, the unsmoothed differences will veto the marking of an edge everywhere except at n = 0. Clearly, this is the desired result.

Now suppose that x[n] = A δ[n], so that y[n] = A(δ[n] − δ[n − 1]), y[0] = A, and, for a symmetric filter, y_k[0] = A(h_k[0] − h_k[1]). The difference at n = 0 will pass the threshold test for the unsmoothed differences if τ < A, but will only be marked as an edge if

$$A > \frac{\tau_k}{h_k[0] - h_k[1]} \tag{7}$$

For a discrete smoothing filter that is symmetric and peaked at the origin, 1 > h_k[0] − h_k[1] > 0, and the value of h_k[0] − h_k[1] becomes smaller as k gets larger. Hence, more contrast is needed to mark an impulse than a step. This also is a desired result.

For more general inputs, equations (5) and (6) can be interpreted as meaning that an edge will be marked by the multi-scale veto rule if and only if the total energy within the passbands of each of the applied filters is significant. Isolated impulse noise, whose difference signal does not have significant energy in the low-frequency end of the spectrum, can be easily removed. If, instead of an impulse, the input signal is an extended pulse (as would be the case for the ideal 1-D profile of a line), the amount of contrast needed to mark the rising and falling edges of the pulse will also depend on the scale and threshold of the largest filter, but it will rapidly decrease as the width of the pulse increases. We use this fact, as discussed below, to adjust the selectivity of the edge detection network for small-scale features.

It is important to note that while the scale and threshold of the largest filter determine the effectiveness with which noise and small features are removed, the smallest filter determines the accuracy with which edges are localized, because it determines the extent over which a change in brightness will be smeared by smoothing. Beyond this extent, the small-scale differences will be insignificant and will veto any differences at larger scales.



4.2. Choosing thresholds and scales 

It might seem that the number of free parameters (the different thresholds and
scale sizes) that need to be specified in order to apply the multi-scale veto rule would
make the method impractical or even arbitrary. However, there are simple ways to
choose thresholds and scales based on the types of features which one wants to retain.
From the resistive grid and diffusion equations, (2) and (3), it can be seen that the 
impulse response functions of the smoothing filters which can be implemented on the 
network are approximately decaying exponentials or Bessel functions. For certain 
values of R v and R h these can be well approximated by even-ordered binomial filters. 
The 1-D binomial filter of order k is given by 



$$b_k[n] = \frac{1}{2^k}\binom{k}{n + k/2}, \qquad -\frac{k}{2} \le n \le \frac{k}{2} \tag{8}$$



For the sake of simplicity we will use b_k[n], with k even, to approximate the
impulse response of the grid, since the coefficients of the binomial filter are easily
computed. Suppose we only want to retain step edges and remove thin lines or
ridges. This can be arranged using only two scales: unsmoothed, k = 0, so that
the step will be well localized, and a second scale with large k so that lines will be
strongly attenuated.
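Because (8) involves only binomial coefficients, the kernels are easy to tabulate. The sketch below is an illustration with hypothetical function names; it also checks a standard property of binomial filters, namely that cascading b_2 with b_k yields b_{k+2}, so higher orders can be built by repeated smoothing with the three-tap kernel [1/4, 1/2, 1/4].

```python
from math import comb


def binomial_kernel(k):
    """Coefficients b_k[n] of (8) for n = -k/2, ..., k/2 (k must be even)."""
    if k < 0 or k % 2:
        raise ValueError("k must be a non-negative even integer")
    return [comb(k, n + k // 2) / 2**k for n in range(-(k // 2), k // 2 + 1)]


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


print(binomial_kernel(2))                  # the three-tap kernel
print(round(binomial_kernel(16)[8], 5))    # centre tap b_16[0]
print(convolve(binomial_kernel(4), binomial_kernel(2)) == binomial_kernel(6))
```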

As a specific example, suppose τ = 10 and k = 16. After smoothing, the difference across a step of height 10 and extent > 16 will have a height of 10 × b_16[0] ≈ 1.964. Let this be the value of τ_16. From (7), a 1-pixel line (an impulse) will pass the veto only if it has magnitude

$$A > \frac{1.964}{b_{16}[0] - b_{16}[1]} \approx 90 \tag{9}$$

Figure 3: Lab scene, original image.



For wider lines, it can easily be checked that a 2-pixel line, x[n] = A(u[n] − u[n − 2]), will need A > 26. A 3-pixel line, x[n] = A(u[n] − u[n − 3]), will need A > 15, and so on. We can increase the selectivity of the veto operation by increasing τ or k. Conversely, we can make the veto less selective for narrow lines by decreasing τ_16. For instance, if τ_16 = 1.4, a 1-pixel line would still need a large magnitude (> 64) to pass, but a 2-pixel line would pass with A = 19.
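The threshold arithmetic above can be reproduced with a few lines of Python. The sketch assumes, as the text does, that the grid's impulse response is the binomial filter (8). For a w-pixel line x[n] = A(u[n] − u[n − w]), the smoothed difference at the rising edge is A(b_k[0] − b_k[w]), by the symmetry of b_k, which generalizes (7); the function names are hypothetical.

```python
from math import comb


def b(k, n):
    """Binomial filter coefficient b_k[n] of (8); zero outside |n| <= k/2."""
    m = n + k // 2
    return comb(k, m) / 2**k if 0 <= m <= k else 0.0


def min_line_contrast(width, k, tau_k):
    """Height A that the rising edge of a `width`-pixel line must exceed
    to pass the smoothed test |y_k[0]| > tau_k."""
    return tau_k / (b(k, 0) - b(k, width))


tau16 = 10 * b(16, 0)                     # smoothed step difference for tau = 10
print(round(tau16, 3))
for w in (1, 2, 3):
    print(w, round(min_line_contrast(w, 16, tau16), 2))
print(round(min_line_contrast(1, 16, 1.4), 2), round(min_line_contrast(2, 16, 1.4), 2))
```

The unsmoothed test additionally requires A > τ; with τ = 10 all of the contrasts above already satisfy it.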



4.3. Simulation results 

Simulated results of the edge detection and reconstruction networks are shown for
two test images in Figures 3-11. The first set of results is for the 240 × 320 picture
of a cluttered lab shown in Figure 3. The second set is for the 256 × 256 picture of
David shown in Figure 7. Brightness values in the images are quantized to the range 0-255.
In these simulations we approximated the smoothing function of the edge detection
network by even-order binomial filters, since these are good approximations to the
network point spread function and are easy to generate. We will use the notation
$b_k$ to refer to the 2-D filter generated by the convolution of a horizontally- and a
vertically-oriented 1-D filter of order $k$ as given by (8).

(a) Binary edge map
(b) Reconstructed image

Figure 4: Binary edge map and reconstruction of lab scene. Thresholds and scale set for
attenuating thin lines, $\tau_0 = 20$, $k = 14$. Number of data points = 30156 (39% of image).
RMS difference between original and reconstruction = 10.1 gray levels.

(a) Binary edge map
(b) Reconstructed image

Figure 5: Binary edge map and reconstruction of lab scene. Smaller second scale used
to preserve some thin lines, $\tau_0 = 20$, $k = 10$. Number of data points = 32700 (42.6% of
image). RMS difference between original and reconstruction = 7.5 gray levels.

Figure 6: Top: original image. Middle: reconstruction with $k = 14$. Bottom: reconstruction
with $k = 10$.
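The binomial filters $b_k$ used in the simulations can be generated as sketched below. This is a minimal illustration, assuming that the order-$k$ 1-D filter of (8) is the $k$-fold convolution of $[1/2, 1/2]$ (taps $\binom{k}{j}/2^k$). Because convolving a horizontal 1-D filter with a vertical one yields their outer product, $b_k$ is separable. The function names are hypothetical.

```python
def binomial_1d(k):
    """Order-k 1-D binomial filter: the k-fold convolution of [1/2, 1/2]."""
    taps = [1.0]  # order 0 is the identity filter
    for _ in range(k):
        # Convolving with [1/2, 1/2] averages each pair of neighbouring taps,
        # padding with zeros at both ends, so the filter grows by one tap.
        taps = [0.5 * (a + b) for a, b in zip([0.0] + taps, taps + [0.0])]
    return taps


def binomial_2d(k):
    """b_k: a horizontal and a vertical order-k filter convolved together.

    For a row filter h and a column filter h, the 2-D convolution is the
    outer product, so b_k[i][j] = h[i] * h[j].
    """
    h = binomial_1d(k)
    return [[hi * hj for hj in h] for hi in h]


b10 = binomial_2d(10)
print(len(b10), len(b10[0]), sum(map(sum, b10)))  # an 11 x 11 kernel whose weights sum to 1
```

Separability also means a 2-D smoothing with $b_k$ can be done as two 1-D passes, which keeps the per-pixel cost linear in $k$.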

In the first test we show the effect of changing the scales on detecting small
features such as thin lines. Figures 4(a) and (b) are respectively the binary edge
map and reconstruction from using a relatively high threshold, $\tau_0 = 20$, and a large
second scale, $k = 14$. The dark points in the binary map indicate where switches are
closed in the reconstruction network. They are the locations of image pixels which
are adjacent to an edge and thus always occur in pairs. The image contains a large
amount of detail, resulting in many edges being marked. Notice, however, the
smaller-scale features, such as the cables hanging from the scope and the
workbench: those with relatively low contrast are not picked up by the edge
detector and hence, apart from a few points which hint at their existence, do not
show up in the reconstructed image. In the second test, Figures 5(a) and (b), the
same threshold $\tau_0$ was used for the unsmoothed data, but a smaller filter, $k = 10$,
was used as the second scale. In Figure 6 the two reconstructed images are shown
together with the original in order to facilitate comparison. Note how some, though
not all, of the cables reappear in the reconstructed image.

In the lab scene there is a lot of clutter, but most of the objects in the image
(boxes, tables, workstations) are close to having planar or approximately harmonic
surfaces. Since the reconstruction network interpolates between edges with harmonic
functions, it is not too surprising that the reconstructed images are very similar to
the original. An example of a different type of image is Figure 7, which has little
clutter and only one major object in the scene, namely a face, which is a very
non-planar surface. It is interesting to examine how such an image can be reconstructed
from piecewise harmonic functions and, more importantly, how many data points
are needed to give a recognizable result. In generating the images in Figures 8-10,
the same scales as in Figure 5 (the unsmoothed data and $k = 10$) were used, while the
threshold $\tau_0$ was varied. In the face, most of the information on shape is contained
in the variation of the brightness gradient. By changing $\tau_0$, we change the number of
edges which are marked, and therefore change the amount of variation in the brightness
gradient of the reconstructed image.

Figure 7: David, original image.

The results are shown in Figures 8-10, where thresholds $\tau_0$ of 9, 12, and 15 were
used. The three reconstructed images are shown together with the original in
Figure 11. As $\tau_0$ increases, fewer edges are marked and the reconstructed image appears
correspondingly flatter. Even in the last example, however, with only 12% of the
original brightness values used for interpolation, the face is still recognizable.
Figure 10 could be an acceptable reconstruction if we are willing to trade the loss in
apparent facial shape for the savings in the number of data points that need to be
specified.

Although we have only demonstrated it here for the face image, it is true in general 
that, even though the subjective visual quality of the reconstructed image degrades 
as $\tau_0$ increases, the result remains recognizable over a wide range of thresholds. For
the lab scene, which contains more contrast than the face, the range is larger, but 
the same phenomenon is observed. This is an important practical observation since 
it implies that the choice of a specific threshold value is not crucial to the outcome. 
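The two figures of merit quoted in the captions, the fraction of pixels used as data points and the RMS difference from the original, are simple to compute. The sketch below assumes that "RMS difference" means the root-mean-square of the pixelwise brightness difference over the whole image; the helper names are hypothetical. Applied to the caption counts, it reproduces the quoted percentages (for example, 8031 of the 256 × 256 pixels is about 12%).

```python
import math


def data_fraction(num_points, height, width):
    """Fraction of image pixels whose brightness is passed to the reconstruction network."""
    return num_points / (height * width)


def rms_difference(original, reconstruction):
    """Root-mean-square pixelwise difference between two equally sized images (lists of rows)."""
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(original, reconstruction)
        for a, b in zip(row_a, row_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Data-point counts and image sizes from the captions of Figures 4, 5, 8, 9 and 10.
for points, h, w in [(30156, 240, 320), (32700, 240, 320),
                     (13368, 256, 256), (10190, 256, 256), (8031, 256, 256)]:
    print(points, f"{100 * data_fraction(points, h, w):.1f}%")
```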
(a) Binary edge map
(b) Reconstructed image

Figure 8: Binary edge map and reconstruction of David with low threshold, $\tau_0 = 9$, to
pick up more detail. Number of data points = 13368 (20.4% of image). RMS difference
between original and reconstruction = 7.1 gray levels.
(a) Binary edge map
(b) Reconstructed image

Figure 9: Binary edge map and reconstruction of David with intermediate threshold,
$\tau_0 = 12$, to eliminate some edges. Number of data points = 10190 (15.5% of image). RMS
difference between original and reconstruction = 9.1 gray levels.
(a) Binary edge map
(b) Reconstructed image

Figure 10: Binary edge map and reconstruction of David with $\tau_0 = 15$. The high threshold
eliminates much of the detail on the face. Number of data points = 8031 (12% of image). RMS
difference between original and reconstruction = 11.5 gray levels.
Figure 11: Top left: original image. Top right: reconstruction with $\tau_0 = 9$. Bottom left
to right: reconstructions with $\tau_0 = 12$ and $\tau_0 = 15$.







Figure 12: Resistive fuse network for solving discrete variational problem of equation (11). 
Horizontal elements behave as linear resistors for small voltages across their terminals, but 
are open-circuits if the voltage difference is too large. 

5. Comparison of the MSV Model to the Weak Membrane 



Like the weak membrane, and other variational models, the MSV model segments 
an image into a set of piecewise smooth functions by determining the points in the 
image where the brightness function departs significantly from smoothness. Clearly, 
it is desirable to find the minimum number of such points which will result in a good 
approximation of the image. The weak membrane model formulates these goals as 
an optimization problem whose solution yields both the points on the discontinuity 
set and the piecewise smooth functions which approximate the image. 

The weak membrane has some problems associated with its formulation, however,
which the MSV model is able to avoid. One is that the energy function of equation (1),
which is repeated below,

$$E(u, \Gamma) = \mu^2 \iint_R (u - d)^2 \, dx\, dy + \iint_{R \setminus \Gamma} |\nabla u|^2 \, dx\, dy + \nu |\Gamma| \qquad (10)$$

is non-convex, so gradient descent methods can become trapped in local minima rather
than reaching the optimal solution. This problem, which is well explained by Blake and
Zisserman in [3], is intrinsic and arises because of the penalty which must be paid for
creating a discontinuity before the system can reach a lower energy state. This problem
does not occur in the MSV model because there is no feedback between the reconstruction
and edge detection networks.

Equation (10) can be discretized and modeled by a resistive network. In one
dimension the discrete equation is

$$E(u, l) = \mu^2 \sum_{i} (u_i - d_i)^2 + \sum_{i} (1 - l_i)(u_{i+1} - u_i)^2 + \nu \sum_{i} l_i \qquad (11)$$

where the $\{0, 1\}$-valued variables $l_i$ model the discontinuity set $\Gamma$ of equation (10).
The equivalent circuit for (11) is shown in Figure 12 [16]. The horizontal elements
in this network are resistive fuses, which break if the voltage across their terminals
rises above a critical value, but otherwise behave as linear resistors. Several
implementations of the 2-D version of the network in Figure 12, which differ principally in
their design of the resistive fuse elements, have been built in VLSI [8,14,24]. Circuit
implementations of the weak membrane cannot escape the non-convexity problem,
however, and some effort is required to nudge them to the optimal solution [16].
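The discrete energy (11) can be evaluated directly. Assuming it has the standard form $\mu^2 \sum_i (u_i - d_i)^2 + \sum_i (1 - l_i)(u_{i+1} - u_i)^2 + \nu \sum_i l_i$, each $l_i$ enters only through its own link, so for a fixed $u$ the energy is minimized link by link: break the link ($l_i = 1$) exactly when $(u_{i+1} - u_i)^2 > \nu$. This is an idealized resistive fuse whose critical voltage is $\sqrt{\nu}$. The sketch below is our illustration only, not a model of any particular fuse circuit, and the function names are hypothetical.

```python
def membrane_energy(u, d, l, mu, nu):
    """Discrete 1-D weak-membrane energy, assumed to have the form
    mu^2 * sum_i (u_i - d_i)^2 + sum_i (1 - l_i)(u_{i+1} - u_i)^2 + nu * sum_i l_i,
    where l_i in {0, 1} marks a break between samples i and i + 1.
    """
    data_term = mu ** 2 * sum((ui - di) ** 2 for ui, di in zip(u, d))
    smooth_term = sum((1 - li) * (u[i + 1] - u[i]) ** 2 for i, li in enumerate(l))
    return data_term + smooth_term + nu * sum(l)


def best_breaks(u, nu):
    """For fixed u, choose each l_i independently to minimize the energy:
    break a link only if its smoothness cost exceeds the penalty nu,
    i.e. if |u_{i+1} - u_i| > sqrt(nu) (an idealized resistive fuse)."""
    return [1 if (u[i + 1] - u[i]) ** 2 > nu else 0 for i in range(len(u) - 1)]


u = [0.0, 0.0, 10.0, 10.0]
breaks = best_breaks(u, nu=25.0)
print(breaks)  # [0, 1, 0]: only the 10-unit jump exceeds sqrt(25) = 5
print(membrane_energy(u, u, breaks, mu=1.0, nu=25.0))     # 25.0: pay nu once for the break
print(membrane_energy(u, u, [0, 0, 0], mu=1.0, nu=25.0))  # 100.0: smoothing across the jump
```

Choosing $l$ given $u$ is easy; the difficulty described in the text lies in the joint minimization over $u$ and $l$, which is non-convex.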

A second problem with the weak membrane is that the optimal piecewise smooth
functions $u$ are determined from all of the data and not just the values adjacent to
a discontinuity. They are also strongly determined by the scale parameter $\mu$. The
resistive fuse network of Figure 12 and the multi-scale veto edge detection network
of Figure 1 appear similar, because both perform smoothing by a resistive grid.
Both the MSV model and the weak membrane reconstruct a brightness function
from the data, but with different boundary conditions. Returning to the continuous
formulation, if $\Gamma$ is given in (10), then the functions $u(x, y)$ which minimize $E$ satisfy
the Euler equation

$$u - \frac{1}{\mu^2} \nabla^2 u = d \qquad (12)$$

subject to the condition

$$\mathbf{n} \cdot \nabla u = 0 \quad \text{on } \Gamma, \qquad (13)$$

where $\mathbf{n}$ is the unit normal to $\Gamma$. Comparing (12) with the resistive grid equation (3),
this is the same as saying that $u$ is a smoothed version of $d$. In the MSV network we
generate the piecewise smooth reconstruction of the data by solving Laplace's equation
subject to the boundary condition $u = d$ on $\Gamma$. This is equivalent to minimizing only the
second term in (10), or setting $\mu = 0$.
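The MSV reconstruction step can be sketched as a relaxation of the discrete Laplace equation: pixels marked by the edge detector keep their data values ($u = d$ on $\Gamma$), and every other pixel is repeatedly replaced by the average of its four neighbors. The Jacobi iteration, the fixed iteration count, and the replicated-border (zero normal derivative) condition at the image boundary are all our assumptions for illustration; the function name is hypothetical.

```python
def reconstruct(image, marked, iterations=2000):
    """Harmonic interpolation of an image from its marked pixels.

    image:  list of rows of brightness values (only marked entries are used).
    marked: list of rows of booleans, True where the edge detector kept the data.
    Unmarked pixels are updated by Jacobi iteration toward the mean of their
    four neighbours, a discrete form of Laplace's equation. A neighbour that
    falls outside the image is replaced by the pixel itself (assumed
    zero-normal-derivative condition at the image border).
    """
    h, w = len(image), len(image[0])
    known = [image[r][c] for r in range(h) for c in range(w) if marked[r][c]]
    if not known:
        raise ValueError("at least one pixel must be marked")
    start = sum(known) / len(known)
    u = [[image[r][c] if marked[r][c] else start for c in range(w)] for r in range(h)]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for r in range(h):
            for c in range(w):
                if marked[r][c]:
                    continue  # data values are held fixed: u = d on the marked set
                total = 0.0
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= rr < h and 0 <= cc < w
                    total += u[rr][cc] if inside else u[r][c]
                new[r][c] = total / 4.0
        u = new
    return u


# A plane is harmonic, so it is recovered from its border values alone.
size = 8
plane = [[3.0 * r + 2.0 * c + 5.0 for c in range(size)] for r in range(size)]
border = [[r in (0, size - 1) or c in (0, size - 1) for c in range(size)] for r in range(size)]
result = reconstruct(plane, border)
print(max(abs(result[r][c] - plane[r][c]) for r in range(size) for c in range(size)))
```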

Both methods can therefore be viewed as alternative ways of regularizing the
brightness data with interpolating splines. The solution obtained by solving (12)
and (13), however, must trade off how effectively noise can be removed by smoothing
against how natural the resulting image will be. This problem can be understood by
considering the limiting cases $\mu \to 0$ and $\mu \to \infty$.



As $\mu \to 0$, equation (12) becomes

$$\nabla^2 u \approx 0. \qquad (14)$$

The solution in this case is approximately harmonic, but due to the boundary
condition (13), within each region it must approach a constant, since that is the only
harmonic function which has zero normal derivative everywhere on its boundary. In this
case noise within the regions between the discontinuities will be completely smoothed
away, but the resulting image will be a collection of patches of constant brightness and
will appear very cartoon-like.

At the other extreme, $\mu \to \infty$, we have

$$u - d \approx 0. \qquad (15)$$

In this case, the output will appear more natural because it is approximately the 
same as the input, but there is also very little smoothing. 
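The two limits (14) and (15) can be seen in a 1-D discrete analogue of (12) and (13). With the discontinuity set held fixed, minimizing $\mu^2 \sum_i (u_i - d_i)^2$ plus $\sum (u_{i+1} - u_i)^2$ over the unbroken links gives the linear system $(\mu^2 I + L)u = \mu^2 d$, where $L$ is the graph Laplacian of the unbroken links. A broken link simply decouples its two sides, which is the discrete counterpart of the zero-normal-derivative condition (13). The discretization and the tridiagonal (Thomas) solver are our choices for illustration, and the function name is hypothetical.

```python
def membrane_fixed_breaks(d, breaks, mu):
    """Solve (mu^2 I + L) u = mu^2 d for a 1-D signal d with a fixed set of breaks.

    breaks: set of link indices i; the link between samples i and i + 1 is cut.
    L is the graph Laplacian of the remaining links, so the matrix is tridiagonal,
    symmetric and, for mu > 0, strictly diagonally dominant; the Thomas
    algorithm therefore solves it without pivoting.
    """
    n = len(d)
    m2 = mu * mu
    link = [0.0 if i in breaks else 1.0 for i in range(n - 1)]
    diag = [m2 + (link[i - 1] if i > 0 else 0.0) + (link[i] if i < n - 1 else 0.0)
            for i in range(n)]
    rhs = [m2 * di for di in d]
    # Forward sweep: eliminate the sub-diagonal entries (-link[i - 1]).
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        lower = -link[i - 1] if i > 0 else 0.0
        denom = diag[i] - (lower * cp[i - 1] if i > 0 else 0.0)
        cp[i] = (-link[i] / denom) if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (lower * dp[i - 1] if i > 0 else 0.0)) / denom
    # Back substitution.
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u


d = [1.0, 3.0, 2.0, 10.0, 12.0, 11.0]
cut = {2}  # break between samples 2 and 3
print([round(v, 2) for v in membrane_fixed_breaks(d, cut, mu=0.01)])   # close to the segment means 2 and 11
print([round(v, 2) for v in membrane_fixed_breaks(d, cut, mu=100.0)])  # close to d itself
```

Small $\mu$ flattens each segment to a constant (the cartoon-like limit), while large $\mu$ returns the data almost unchanged.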

The images reconstructed in the MSV model look more natural and are very 
similar visually to the originals because there are fewer constraints to satisfy. The 
functions only have to match the data where it is given and satisfy Laplace's equation 
everywhere else. Furthermore, noise can be more effectively removed since any feature 
which does not generate an edge is erased entirely from the reconstructed image and 
not just smoothed into the background. 

It should be noted that the weak membrane does have some features which are
not shared by the MSV model, for instance the hysteresis property, which gives an
existing edge the tendency to extend itself, just as a tear does in a real membrane.
Also, the weak membrane model can be formulated as a well-defined minimization
problem, so that one can speak of an optimal solution. We do not know of a way
to formulate, as a variational problem, the problem that the MSV model attempts to
solve: finding the minimal discontinuity set bounding piecewise harmonic functions
which are, in some sense, good approximations to the original image. The
method used in the MSV model for finding edges is a heuristic based on the
idea that the magnitude of the gradient for a harmonic surface which extends over
any significant area can be bounded over most of its extent by a small number, such
as the threshold used in the tests. This is seen from the fact that the functions $f$
which minimize

$$\iint_D |\nabla f|^2 \, dx\, dy \qquad (16)$$

over some domain $D$ are solutions to $\nabla^2 f = 0$ within the domain [12]. By marking
the points where the change in brightness is above some threshold and is significant
over a range of spatial scales, we determine the locations where the underlying
brightness function is most likely to depart from harmonicity, and where interpolation from
neighboring values is least likely to be a good approximation to the data. In terms
of finding the minimal discontinuity set, it is easy to show that this heuristic is not
optimal. For instance, a steeply inclined plane will give rise to a discontinuity at
every point on its slope, even though a plane is a harmonic function for which it
would suffice to specify its boundary points. In practice, however, such features
cannot occur very often because the spatial extent of a steeply sloped surface is limited
by the dynamic range of the image. It can only rise for a few pixels before it has to
level off. The philosophy of the MSV method is that it is better to accept a less than
optimal heuristic than to complicate the circuit design to deal with these cases.
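The steep-ramp example and the dynamic-range argument can be made concrete in 1-D. The sketch applies only the unsmoothed threshold test, a simplification of the full veto, and marks both pixels adjacent to every link whose brightness step exceeds $\tau_0$. With brightness limited to 0-255, each marked link on a monotonic ramp contributes a rise of more than $\tau_0$, so a ramp can produce fewer than $255/\tau_0$ such links (at most 12 for $\tau_0 = 20$). The helper names are hypothetical.

```python
def ramp(slope, length, max_level=255):
    """A 1-D ramp rising by `slope` gray levels per pixel until it saturates."""
    return [min(slope * n, max_level) for n in range(length)]


def marked_pixels(signal, tau0):
    """Pixels adjacent to a link whose brightness step exceeds tau0 (single-scale test only)."""
    marked = set()
    for n in range(1, len(signal)):
        if abs(signal[n] - signal[n - 1]) > tau0:
            marked.update((n - 1, n))  # edge pixels always come in pairs
    return sorted(marked)


steep = ramp(slope=30, length=20)
print(marked_pixels(steep, tau0=20))  # [0, 1, ..., 8]: every pixel on the slope is marked
# A linear ramp is harmonic, so its two end points would have sufficed for interpolation.
```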



6. Summary and Discussion 



We have presented a model of a two-stage analog network for edge detection and 
image reconstruction. Edges are detected in the first stage using the multi-scale veto 
rule, which states that an edge is significant only if it passes a threshold test at each 
of a set of different spatial scales. The image is reconstructed in the second stage 
from the brightness values adjacent to the edges. The two-stage design offers several 
advantages for both performance and applications. Because there is no feedback 
between the stages, there is also no problem of stability or local stationarity. Also 
since the networks are physically distinct, they do not have to be physically close 
to operate properly. This increases the flexibility with which the system may be 
designed, as well as the types of applications for which it may be used.

The multi-scale veto rule allows edges to be localized at the resolution of the 
smallest spatial scale without having to identify maxima in brightness gradients, 
so that second differences do not need to be computed. At the same time noise 
is removed with the efficiency of the largest scale used. The computations can be 
performed on a single network with relatively little circuitry per pixel. The simplicity
of the circuit is an important feature of the model, since it directly affects the
size of the image arrays that it can handle.
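The veto rule can be sketched in 1-D. A link between neighboring pixels is kept as an edge only if the brightness difference across it exceeds the threshold at every scale: the smallest scale (here the unsmoothed data, a binomial filter of order 0) fixes the location, while the larger scale suppresses narrow spikes. This is a minimal sketch under stated assumptions: binomial smoothing stands in for the network's point spread function, as in the simulations of Section 4.3; samples are replicated at the signal ends; and the parameters ($\tau_0 = 20$, order 16, second-scale threshold 1.964) are chosen to match the impulse example of equation (9). The function names are hypothetical.

```python
from math import comb


def binomial_smooth(x, k):
    """Convolve x with the order-k binomial filter; end samples are replicated (assumption)."""
    half = k // 2
    taps = [comb(k, j) / 2 ** k for j in range(k + 1)]
    n = len(x)
    return [
        sum(t * x[min(max(i + j - half, 0), n - 1)] for j, t in enumerate(taps))
        for i in range(n)
    ]


def msv_edges(x, scales, thresholds):
    """Links i (between samples i - 1 and i) that pass the threshold at every scale."""
    smoothed = [binomial_smooth(x, k) for k in scales]
    return [
        i
        for i in range(1, len(x))
        if all(abs(y[i] - y[i - 1]) > tau for y, tau in zip(smoothed, thresholds))
    ]


scales, thresholds = (0, 16), (20.0, 1.964)  # order 0 means unsmoothed data
step = [0.0] * 32 + [100.0] * 32
spike = [0.0] * 64
spike[32] = 50.0  # a 1-pixel line weaker than the A > 90 bound of (9)
print(msv_edges(step, scales, thresholds))   # [32]: localized at the finest scale
print(msv_edges(spike, scales, thresholds))  # []: vetoed at the coarse scale
```

Only first differences are computed at each scale; no gradient maxima need to be located.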

Images are reconstructed in the second stage from the brightness values adjacent 
to edges. The reconstructed images are very similar visually to the originals and 
could serve, for some applications, as acceptable replacements. Since the number of 
data points which need to be specified for the reconstruction network ranges typically 
from 15-45% of the number of pixels in the original image, depending on the amount 
of detail in the scene, and since the edge detection and reconstruction networks are 
physically distinct, this method offers possibilities for data compression. Combined 
with existing methods such as run-length and Huffman coding, the total savings in
bandwidth may be significant. 
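As an illustration of how the binary edge map might be packed, the sketch below run-length codes one row of the map as alternating runs of unmarked and marked pixels. This is only an assumed scheme: the brightness values at the marked pixels would still have to be sent, and the entropy-coding step (for example, Huffman coding of the run lengths) is omitted. The function names are hypothetical.

```python
def run_lengths(bits):
    """Encode a row of 0/1 edge flags as alternating run lengths.

    The first run always counts 0s (it may be empty), so a decoder knows
    that runs alternate 0, 1, 0, 1, ...
    """
    runs = []
    current, count = 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current, count = b, 1
    runs.append(count)
    return runs


def decode_runs(runs):
    """Invert run_lengths."""
    bits, value = [], 0
    for r in runs:
        bits.extend([value] * r)
        value ^= 1
    return bits


row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(run_lengths(row))  # [3, 2, 4, 2, 1]
assert decode_runs(run_lengths(row)) == row
```

Because marked pixels come in pairs, runs of 1s are at least two long, which the entropy coder could exploit.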

This paper has presented the theory behind the MSV model, which is a piece 
of ongoing research. Work is currently in progress on the design and fabrication 
of circuits for the edge detection and reconstruction networks; the design of larger 
systems for solving early vision tasks that incorporate the edge detection network; and 
on the theoretical issues concerned with applying the model to image compression. 